International Journal of Epidemiology
Oxford University Press (OUP)
Preprints posted in the last 7 days, ranked by how well they match International Journal of Epidemiology's content profile, based on 74 papers previously published here. The average preprint has a 0.10% match score for this journal, so anything above that is already an above-average fit.
Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.
Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. 
Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.
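The abstract does not say how the privacy-preserving hashing works. As a rough illustration only, one common way to obtain a stable sample is to apply a keyed hash to each person identifier and include those whose hash falls below a threshold, so the same people are selected at every annual update. The key, the identifier format, and the function names `in_sample` and `pseudonym` are hypothetical and not taken from the source.

```python
import hashlib
import hmac

SAMPLE_FRACTION = 0.30  # target share of the population, as stated in the abstract


def in_sample(person_id: str, secret_key: bytes, fraction: float = SAMPLE_FRACTION) -> bool:
    """Return True if this identifier falls inside the keyed-hash sample."""
    digest = hmac.new(secret_key, person_id.encode("utf-8"), hashlib.sha256).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction


def pseudonym(person_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym that does not reveal the original identifier."""
    message = b"pseudonym:" + person_id.encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

Because the decision depends only on the identifier and the key, rerunning the selection on a later data extract keeps the same individuals in the cohort, and new residents are included with the same probability.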
Staples, J. W.; White, S. L.; Giacalone, A.; Pozdeyev, N.; Sammel, M. D.; Stranger, B. E.; Valencia, C. I.; Santoro, N.; Hendricks, A. E.
Objective. Menopause is a significant physiological transition with implications for health outcomes (e.g., cardiometabolic), yet gaps remain in understanding the menopause transition, including how menopause timing and type influence health outcomes. Large-scale cohort studies in midlife (age ~40-60) females, including the All of Us Research Program (AoURP), provide opportunities to study menopause across diverse populations and data modalities. We characterized menopause-related data in AoURP, focusing on age distributions and concordance between EHR diagnosis codes and self-reported survey responses. Methods. We analyzed menopause-related survey, EHR diagnostic code, and genomic data among ~396,000 participants in AoURP with female sex. We summarized menopause data across modalities, overlap between survey, EHR, and genomic data, and age distributions overall and across sociodemographic characteristics. Results. Among ~396,000 females, surveys captured ~193,000 menopause observations, nearly seven times more than structured EHR diagnoses (~28,000), suggesting under-ascertainment in EHR data. Nearly all females (~99%) with an EHR menopause diagnosis also reported menopause in the survey. Approximately 22,000 participants had overlapping EHR, survey, and genomic menopause-related data. Survey-based age patterns matched expectations, with participants <40 years predominantly reporting pre-menopausal status and those >60 years predominantly reporting post-menopausal status. A small subset (N≈1,700; 4%) aged >70 years reported no menopause, suggesting response or recall bias. EHR menopause codes were concentrated after age 45 years, with a notable spike at age 65. Modest differences in survey-based menopause age distributions were observed by sociodemographic characteristics (e.g., race, ancestry). Conclusions. These findings inform sampling strategies, power calculations, phenotype definition, and study design for menopause research using AoURP.
Bailey, M.; Hammerton, G.; Fairchild, G.; Tsunga, L.; Hoffman, N.; Burd, T.; Shadwell, R.; Danese, A.; Armour, C.; Zar, H. J.; Stein, D. J.; Donald, K. A.; Halligan, S. L.
Objective: There is little longitudinal research investigating links between violence exposure and mental disorders among children in low- and middle-income countries (LMICs), despite high rates of violence. We examined cross-sectional and longitudinal violence-mental health associations among children in a large South African birth cohort, the Drakenstein Child Health Study, including direct clinical interviews capturing children's mental disorders. Method: In this birth cohort (N=974), we assessed lifetime violence exposure and four subtypes (witnessed community, community victimization, witnessed domestic, domestic victimization) at ages 4.5 and 8 years via caregiver reports. At 8 years, caregivers completed the Child Behaviour Checklist, and psychiatric disorders were assessed using the Mini-International Neuropsychiatric Interview for Children and Adolescents, a self-report measure. We tested for associations using linear/logistic regressions, adjusted for confounders. Results: Most children (91%) had experienced violence by 8 years. Cross-sectionally, total violence exposure was associated with total (B=0.49 [95% CI 0.32, 0.66]), internalizing (0.32 [0.17, 0.47]), and externalizing problems (0.46 [0.31, 0.61]), and with increased odds of disorder at 8 years (aOR=1.09 [1.05, 1.13]). Longitudinally, total violence exposure up to 4.5 years was associated with total (B=0.27 [0.03, 0.52]), internalizing (0.24 [0.04, 0.44]), and externalizing scores (0.23 [0.008, 0.45]) at 8 years, but not with increased risk of psychiatric disorders. The strongest and most consistent associations were observed for domestic versus community violence subtypes. Conclusion: Our strong cross-sectional but weaker longitudinal findings suggest that recent violence exposures may be more critical than early exposures for children's mental health. Longitudinal exploration of other violence-affected LMIC populations is urgently needed.
Wang, J.; Morrison, J.
Mendelian randomization (MR) uses genetic variants as instrumental variables to infer causal relationships between complex traits. Standard MR can be used to estimate an average causal effect at the population level, and typically assumes a linear exposure-outcome relationship. Recently, several methods for estimating nonlinear effects have been developed. However, many have been found to produce spurious empirical findings when subjected to negative control analyses. We propose that this poor performance may be attributable to heterogeneity in variant-exposure associations. We demonstrate that heterogeneous genetic effects on exposure lead to biased estimates, poor coverage, and inflated type I error in control function and stratification-based methods. In contrast, two-stage least squares (TSLS) methods are robust to such heterogeneity, but suffer from low precision and low power in some circumstances. We show that a statistical test for heterogeneity can be used to guide the choice of nonlinear MR methods. Using UK Biobank data, we reassess the causal effects of BMI, vitamin D, and alcohol consumption on blood pressure, lipids, C-reactive protein, and age (negative control). We find strong evidence of heterogeneity for all three exposures, and also recapitulate previous results that control function and stratification-based methods are prone to false positives. Finally, using nonparametric TSLS, we identify evidence of nonlinear causal effects of BMI on HDL cholesterol, triglycerides, and C-reactive protein; however, specific estimates of the shape of these relationships are imprecise. Altogether, our results suggest that common nonlinear MR methods are unreliable in the presence of realistic levels of heterogeneity, and that more methodological development is required before practically useful nonlinear MR is feasible.
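To make the instrumental-variable idea concrete, here is a minimal sketch of the standard linear case only, not of the nonlinear methods the abstract evaluates. With a single instrument and a linear model, TSLS reduces to the Wald ratio cov(Z, Y) / cov(Z, X). The toy data and variable names are invented for illustration: `z` plays the genetic instrument, `u` an unmeasured confounder that is uncorrelated with `z`, and the true causal effect of `x` on `y` is set to 0.5.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Naive regression slope of y on x, which is biased under confounding."""
    return covariance(x, y) / covariance(x, x)


def wald_ratio(z, x, y):
    """Single-instrument IV estimate: cov(Z, Y) / cov(Z, X)."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical toy data (not from the paper).
z = [0, 0, 1, 1, 2, 2]                            # instrument, e.g. allele count
u = [-1, 1, -1, 1, -1, 1]                         # confounder, uncorrelated with z
x = [2 * zi + ui for zi, ui in zip(z, u)]         # exposure
y = [0.5 * xi + 3 * ui for xi, ui in zip(x, u)]   # outcome; true effect is 0.5

iv_estimate = wald_ratio(z, x, y)
naive_estimate = ols_slope(x, y)
```

In this constructed example the instrument recovers the true effect while the naive regression slope is inflated by the confounder, which is the motivation for MR in the first place.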
Bui, L. V.; Nguyen, D. N.
Background. Vietnam's disease burden has shifted from communicable, maternal, neonatal, and nutritional (CMNN) causes to non-communicable diseases (NCDs), but the tempo, drivers, and regional positioning of this transition have not been jointly quantified. We characterised Vietnam's epidemiological transition 1990-2023 against ten Southeast-Asian (SEA) peers. Methods. Using Global Burden of Disease 2023 data, we computed joinpoint-regression AAPC with 95% CI (BIC-penalised, up to three break-points) for age-standardised DALY rates and cause-composition shares. We applied Das Gupta three-factor decomposition to 1990-2023 absolute DALY change (population-size, age-structure, age-specific-rate effects) and benchmarked Vietnam's NCD share against an SDI-conditional peer trajectory via leave-one-out quadratic regression. Premature mortality was quantified as WHO 30q70 under both broad NCD and strict SDG 3.4.1 definitions, using Chiang II life-table adjustment identically across all eleven countries. Findings. The CMNN age-standardised DALY rate fell from 13,295.9 to 4,022.1 per 100,000 (AAPC -4.63%/year; 95% CI -4.80 to -4.46); the NCD rate fell only from 21,688.2 to 19,282.8 (AAPC -0.37; -0.45 to -0.30). NCD share of total DALYs rose from 52.99% to 70.67% (+17.67 pp; AAPC +1.09). Vietnam ranked fourth of eleven SEA countries in 2023 (up from sixth in 1990) and sat 5.3% above the SDI-expected trajectory. Das Gupta decomposition attributed the +10.63 million NCD DALY increase to population growth (+6.26 M) and ageing (+6.08 M); rate change removed only 1.71 M. Premature NCD mortality fell from 25.02% to 21.80% (broad, 12.9% reduction) and from 22.17% to 19.50% (SDG 3.4.1, 12.0%; Vietnam sixth of eleven) - far short of the SDG 3.4 one-third-reduction target. Interpretation. Vietnam has entered a disability- and ageing-dominated NCD phase. Meeting SDG 3.4 by 2030 requires population-scale primary prevention sized to demographic momentum.
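The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the abstract; the helper name `relative_reduction` is ours. It confirms that the three Das Gupta components sum to the reported net change and that the relative reductions in premature mortality match the stated percentages, which can then be compared with a one-third reduction.

```python
def relative_reduction(start: float, end: float) -> float:
    """Fractional decline from start to end."""
    return (start - end) / start


# Premature NCD mortality (30q70, %), 1990 -> 2023, from the abstract.
broad = relative_reduction(25.02, 21.80)     # broad NCD definition
strict = relative_reduction(22.17, 19.50)    # SDG 3.4.1 definition

# Das Gupta components of the NCD DALY change (millions), from the abstract.
population_growth = 6.26
ageing = 6.08
rate_change = -1.71
net_change = population_growth + ageing + rate_change

ONE_THIRD = 1 / 3  # size of reduction named in the SDG 3.4 target
```

Both reductions come out well under one third, consistent with the abstract's conclusion that progress falls far short of the target.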
O'Mahony, D. G.; Beasley, J.; Zanti, M.; Dennis, J.; Dutta, D.; Kraft, P.; Kristensen, V.; Chenevix-Trench, G.; Easton, D. F.; Michailidou, K.
Summary statistics fine-mapping methods offer advantages over classical methods, including avoiding data-sharing constraints and improved modelling of correlated variables and sparse effects. However, their performance has not been comprehensively evaluated in breast cancer using real-world data. Previous multinomial stepwise regression (MNR) fine-mapping analyses for breast cancer identified 196 credible sets. Here, we apply summary statistics fine-mapping, compare methods, and assess parameters influencing performance. Using summary statistics from the Breast Cancer Association Consortium, we compared finiMOM, SuSiE, and FINEMAP to published MNR results across 129 regions. Performance was assessed by recall using in-sample and out-of-sample LD. Discordant credible sets were examined for technical factors, and target genes were defined using the INQUISIT pipeline. SuSiE showed the closest agreement with MNR. Results varied across regions depending on the assumed number of causal variants (L), with higher values reducing recall and no single L maximising performance. At optimal L per region, SuSiE identified 8,192 CCVs in 244 credible sets, with recall of 88%, 86%, and 72% for overall, ER-positive, and ER-negative breast cancer. Thirty MNR sets were missed. Discordance was partially explained by allele flips, imputation quality, and array heterogeneity. Fifty-two MNR-identified genes, including BRCA2, WNT7B, and CREBBP, were not recovered, while additional candidate genes were identified. Using out-of-sample LD reduced recall by 3% but identified novel variants. Fine-mapping results vary across methods, and no single approach is sufficient. The choice of L strongly influences results, and combining analytical approaches with functional validation can improve causal variant identification.
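For readers unfamiliar with credible sets, the following is a generic sketch of how one is usually formed for a single association signal: rank variants by their posterior probability of being the causal variant and keep adding them until the cumulative probability reaches the chosen coverage. This is a textbook construction, not the specific procedure of any tool named in the abstract, and the variant identifiers and probabilities are hypothetical.

```python
def credible_set(probs: dict[str, float], coverage: float = 0.95) -> list[str]:
    """Smallest top-ranked set of variants whose probabilities sum to >= coverage.

    `probs` maps each variant to its posterior probability of being the causal
    variant for one signal (the values are expected to sum to about 1).
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= coverage:
            break
    return chosen


# Hypothetical posterior probabilities for four variants in one region.
example = {"rs_a": 0.60, "rs_b": 0.30, "rs_c": 0.06, "rs_d": 0.04}
```

Calling `credible_set(example)` keeps the three highest-ranked variants; a more diffuse posterior would yield a larger set, which is one reason the number of candidate causal variants per set varies so much between regions.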
Babalola, C. M.; Medina-Marino, A.; Mdingi, M. M.; Wilson, M. L.; Mukomana, F.; Muzny, C. A.; Taylor, C. M.; Gigi, R. M.; Jung, H.; Low, N.; Peters, R. P.; Klausner, J. D.
Background: Chlamydia trachomatis, Neisseria gonorrhoeae, and Trichomonas vaginalis are curable sexually transmitted infections (STIs) associated with adverse birth outcomes. Most infections are asymptomatic. Whether antenatal STI screening improves birth outcomes remains uncertain. Methods: In a randomized three-group trial in South Africa, pregnant women aged 18 years or older were assigned before 27 weeks gestation to: (1) screening and treatment for Chlamydia trachomatis, Neisseria gonorrhoeae, and Trichomonas vaginalis at enrollment, with tests-of-cure (One-Time Screening); (2) screening and treatment at enrollment, repeated at 30 to 34 weeks (Two-Time Screening); or (3) Standard-of-Care (Syndromic management). The primary outcome was a composite of preterm birth (<37 weeks gestation) or low birthweight (<2500 g), analyzed in the modified intention-to-treat population of participants with live births. Components of the composite outcome were evaluated individually as the main secondary outcomes. The study was registered with ClinicalTrials.gov, NCT04446611. Findings: Of 2247 enrolled participants, 1910 had live births. The composite outcome occurred in 22·9% of the One-Time Screening group (risk ratio [RR] 0·99; 95% confidence interval [CI] 0·81-1·21), 20·6% of the Two-Time Screening group (RR 0·89; 95% CI 0·72-1·09), compared with 23·2% of the Standard-of-Care group. Preterm birth occurred in 18·9% of the One-Time Screening group (RR 1·00; 95% CI 0·80-1·26), 14·5% of the Two-Time Screening group (RR 0·77; 95% CI 0·60-0·99), and 18·8% of the Standard-of-Care group. 
Low birthweight occurred in 14·1% of the One-Time Screening group (RR 1·10; 95% CI 0·83-1·46), 12·9% of the Two-Time Screening group (RR 1·01; 95% CI 0·76-1·35), and 12·8% of the Standard-of-Care group. Interpretation: Neither screening strategy for Chlamydia trachomatis, Neisseria gonorrhoeae, and Trichomonas vaginalis reduced the primary composite outcome of preterm birth or low birthweight, or low birthweight alone. The Two-Time antenatal STI screening strategy, however, reduced preterm birth by 23%.
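The "23%" in the interpretation follows directly from the reported risks. As a small worked check using only figures from the abstract (the function name is ours), the risk ratio is the preterm birth risk in the Two-Time Screening group divided by the risk in the Standard-of-Care group, and the relative reduction is one minus that ratio. Ratios recomputed from rounded percentages can differ from the published ones in the last digit.

```python
def risk_ratio(risk_intervention: float, risk_control: float) -> float:
    """Ratio of outcome risk in the intervention group to the control group."""
    return risk_intervention / risk_control


# Preterm birth: 14.5% (Two-Time Screening) vs 18.8% (Standard-of-Care).
rr_preterm = risk_ratio(14.5, 18.8)
relative_reduction = 1 - rr_preterm
```

The reported 95% CI for this ratio (0.60 to 0.99) has an upper bound close to 1, so the size of the reduction is estimated with considerable uncertainty.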
Yao, S.; Zimbalist, A.; Sheng, H.; Fiorica, P.; Cheng, R.; Medicino, L.; Omilian, A.; Zhu, Q.; Roh, J.; Laurent, C.; Lee, V.; Ergas, I.; Iribarren, C.; Rana, J.; Nguyen-Huynh, M.; Rillamas-Sun, E.; Hershman, D.; Ambrosone, C.; Kushi, L.; Greenlee, H.; Kwan, M.
Background: Few studies have examined racioethnic disparities in cardiovascular disease (CVD) in women after breast cancer treatment, who are at higher risk due to cardiotoxic cancer treatment. Methods: Based on the Pathways Heart Study of women with a history of breast cancer, this analysis examines the association between cardiometabolic risk factors (hypertension, diabetes, and dyslipidemia) and CVD events with self-reported race and ethnicity, as well as genetic similarity. Multivariable logistic and Cox proportional hazards regression models were used to test race and ethnicity and genetic similarity with prevalent and incident cardiometabolic risk factors and CVD events. Results: Of the 4,071 patients in this analysis, non-Hispanic Black (NHB), Asian, and Hispanic women were more likely to have prevalent and incident diabetes than non-Hispanic White (NHW) women. Analysis of genetic similarity revealed results consistent with self-reported race and ethnicity. For CVD risk, NHB women were more likely to develop heart failure and cardiomyopathy than NHW women. In contrast, Hispanic women were at lower risk of any incident CVD, serious CVD, arrhythmia, heart failure or cardiomyopathy, and ischemic heart disease, which was consistent with the associations found with Native American ancestry. Conclusions: This is the largest multi-ethnic study of disparities in CVD health in breast cancer survivors, demonstrating corroborating findings between self-reported race and ethnicity and genetic similarity. The results highlight disparities in cardiometabolic risk factors and CVD among breast cancer survivors that warrant more research and clinical attention in these distinct, high-risk populations.
Unegbu, U. L.
Background: Nigeria bears one of the highest maternal mortality burdens globally, with skilled birth attendance (SBA) remaining critically low in many regions. Understanding the independent determinants of SBA is essential for designing targeted interventions. Methods: This cross-sectional study analyzed 21,465 births from the 2018 Nigeria Demographic and Health Survey (NDHS), a nationally representative household survey using stratified two-stage cluster sampling. SBA was defined as delivery attended by a doctor, nurse, midwife, or auxiliary midwife. Multivariable logistic regression was used to estimate adjusted odds ratios (aOR) with 95% confidence intervals for the associations between SBA and maternal education, household wealth, place of residence, geopolitical region, maternal age, parity, and antenatal care (ANC) utilization, after accounting for confounding. Results: The overall prevalence of SBA was 44.9%. In the fully adjusted model, higher education (aOR = 7.01, 95% CI: 5.68-8.67), richest wealth quintile (aOR = 6.27, 95% CI: 5.27-7.46), and attending ≥4 ANC visits (aOR = 3.80, 95% CI: 3.51-4.11) were the strongest independent predictors of SBA. Regional inequalities were pronounced, with SBA prevalence ranging from 17.7% in the North West to 85.6% in the South West. Crude effect estimates for education and wealth were substantially attenuated after adjustment, indicating large confounding by correlated socioeconomic factors. Conclusions: Maternal education, household wealth, ANC utilization, and geopolitical region are independent determinants of SBA in Nigeria. Scaling up ANC programs represents the most immediately actionable intervention, while long-term gains require investment in girls' education and wealth equity. Targeted strategies for the northern regions are urgently needed. Keywords: skilled birth attendance, maternal mortality, Nigeria, DHS, antenatal care, logistic regression, health equity
Nilsson, A.; da Silva, M.; Le, H. T.; Haggstrom, C.; Wahlstrom, J.; Michaelsson, K.; Trolle Lagerros, Y.; Sandin, S.; Magnusson, P. K.; Fritz, J.; Stocks, T.
Excess body weight has been associated with increased cancer risk, but the role of weight change across adulthood remains unclear. We examined body weight trajectories from ages 17 to 60 and their associations with site-specific cancer incidence. Data were based on the ODDS study, a pooled, nationwide cohort study in Sweden, with data on weight spanning 1911 to 2020, and cancer follow-up through 2023. Weight trajectories were estimated with linear mixed effects models in individuals with at least three weight measurements. Cox regressions estimated hazard ratios for associations between weight trajectories and established and potentially obesity-related cancers. Fifth versus first quintile of weight change was associated with many cancers, most strongly with esophageal adenocarcinoma in men (HR 2.25; 95% CI 1.66-3.04), liver cancer in men (HR 2.67; 95% CI 2.15-3.33), endometrial cancer in women (HR 3.78; 95% CI 3.09-4.61), and pituitary tumors in both sexes (men: HR 3.13 [95% CI 2.13-4.61]; women: HR 2.13 [95% CI 1.41-3.22]). Associations varied by sex and age. Heavier weight at age 17 years and earlier obesity onset were also associated with higher cancer incidence. These findings highlight the importance of a life-course approach to weight management and support sex- and age-targeted cancer prevention strategies.
Robert, A.; Goodfellow, L.; Pellis, L.; van Leeuwen, E.; Edmunds, W. J.; Quilty, B. J.; van Zandvoort, K.; Eggo, R. M.
Background: In England, the burden of respiratory infections varies by ethnicity, contributing to health inequalities, but the role of additional demographic factors remains underexplored. We quantified how differences in social mixing and demographic characteristics between ethnic groups cause inequalities in transmission dynamics. Methods: We analysed the association between ethnicity and number of contacts among 12,484 participants in the 2024-2025 Reconnect social contact survey, using a negative binomial regression model. We simulated respiratory pathogen epidemics using a compartmental model stratified by age, ethnicity, and contact levels, at a national level and in major cities in England. Findings: After adjusting for demographic variables, participants of Black and Mixed ethnicities had more contacts than those of White ethnicity (rate ratios (RR): 1.18 [95% Credible Interval (CI): 1.11-1.26], and 1.31 [95% CI: 1.14-1.52]). Participants of Asian ethnicity had fewer contacts (RR: 0.85 [95% CI: 0.79-0.91]). In national-level simulations, individuals of White ethnicity had the lowest attack rates due to demographic differences and mixing patterns. Local demographic structures changed simulated dynamics: attack rates in individuals of Black and Mixed ethnicities were approximately double those of White ethnicity in Birmingham, but less than 60% higher in Liverpool. Interpretation: Demographic characteristics and mixing patterns create inequalities in transmission dynamics between ethnicities, while local demographic characteristics and pathogen infectiousness change the expected relative burden. To ensure mitigation strategies are effective and equitable, their evaluation must explicitly account for inequalities arising from local context. 
Funding: Medical Research Council, National Institute for Health and Care Research, Wellcome Trust. Research in context. Evidence before this study: We searched PubMed for population-based studies quantifying differences in respiratory infections between ethnic groups, up to 1 April 2026, with no language restrictions. Keywords included: (respiratory pathogens OR influenza OR COVID-19) AND (ethnic* OR race) AND (inequ*) AND (compartmental model OR incidence rate ratio OR hazard ratio). We excluded studies that focused on non-respiratory pathogens (e.g. looking at consequences of COVID-19 on incidence of other pathogens). A population-based cohort study showed that influenza infection risk was higher in South Asian, Black, and Mixed ethnic groups compared to White ethnicity in England. Another population-based cohort study highlighted that during the first wave of COVID-19 in England, the South Asian, Black, and Mixed ethnic groups were more likely to test positive and to be hospitalised than the White ethnic group. Census data in England showed that the distributions of age, household size, household income and employment status differed between ethnic groups, and the recent Reconnect social contact surveys highlighted the impact of each demographic factor on the participants' number of contacts. Added value of this study: Our study shows that social contact patterns, mixing, and demographic structure all lead to unequal infection risk between ethnic groups in respiratory pathogen epidemics. Using the largest available social contact survey in England, we show that both the average number of contacts and the proportion of high-contact individuals varied by ethnic group, even after adjusting for participants' demographics. These differences, together with mixing patterns and age structure, led to lower expected incidence among individuals of White ethnicity than in all other ethnic groups in simulated outbreaks. 
The level of inequality between ethnic groups changed when we used different values of pathogen transmissibility. Finally, as ethnic composition and population structure differ between cities in England, our results show differences in expected inequalities at a local level. Implications of all the available evidence: Inequalities in infection risk between ethnic groups are context- and pathogen-dependent. They arise from both local population structure and contact patterns. Detailed information on mixing between groups and population structure is needed to accurately measure group-specific infection risk. These findings indicate that public health interventions based only on national-level estimates conceal regional variation in risk and may ultimately increase inequalities. Public health interventions need to be tailored to local contexts to be equitable and effective. Finally, our findings provide a foundation for understanding the progression from infection-risk inequalities to disparities in disease presentation and clinical outcomes.
Gada, L.; Afuleni, M. K.; Noble, M.; House, T.; Finnie, T.
Knowing the mortality rates associated with infection by a pathogen is essential for effective preparedness and response. Here, harnessing the flexibility of a Bayesian approach, we produce an estimate of the Infection Fatality Ratio (IFR) for A(H5N1) conditional on explicit assumptions, and quantify the uncertainty thereof. We also apply the method to first-wave COVID-19 data up to March 2020, demonstrating the estimates that could have been obtained had the model been available then. Our analysis uses World Development Indicators (WDI) from the World Bank, the A(H5N1) WHO confirmed cases and deaths tracker by country (2003-2024), and COVID-19 cases and deaths data from Johns Hopkins University (January and February 2020). Since infectious disease dynamics are typically influenced by local socio-economic factors rather than political borders, individual countries are placed within clusters of countries sharing similar WDIs relevant to respiratory viral diseases, with clusters derived by hierarchical clustering. To estimate the IFR, we fit a Negative Binomial Bayesian Hierarchical Model for A(H5N1) and COVID-19 separately. We explicitly modelled key unobserved parameters with informative priors from expert opinion and literature. By modelling underreporting, our analysis suggests lower fatality (15.3%) compared to WHO's Case Fatality Ratio estimate (54%) on lab-confirmed cases. However, credible intervals are wide (95% CrI [0.5%, 64.2%]). Therefore, good preparedness for a potential A(H5N1) pandemic implies adopting scenario planning under our central estimate, as well as for IFRs as high as 70%. Our approach also returns a COVID-19 IFR estimate of 2.8% (95% CrI [2.5%, 3.1%]), which is consistent with the literature.
Hassell, N.; Marcenac, P.; Bationo, C. S.; Hirve, S.; Tempia, S.; Rolfes, M. A.; Duca, L. M.; Hammond, A.; Wijesinghe, P. R.; Heraud, J.-M.; Pereyaslov, D.; Zhang, W.; Kondor, R. J.; Azziz-Baumgartner, E.
Introduction: Modeling when influenza epidemics typically occur can help countries optimize surveillance, time clinical and public health interventions, and reduce the burden of influenza. Methods: We used influenza virus detections reported during 2011-2024 by 180 countries to the Global Influenza Surveillance and Response System, excluding COVID-19 pandemic impacted years (2020-2023). We analyzed data by calendar year (week 1-52) or shifted year (week 30-29) time windows, based on when most influenza detections occurred in each country. For countries with sufficient data, we computed generalized additive models (GAMs) of each country's weekly influenza-positive tests to smooth and impute time series distributions. From these GAMs, we calculated each country's normalized weekly influenza burden. Country-specific normalized time series were grouped using hierarchical k-means clustering reducing the Euclidean distance between time series within clusters. We calculated cluster-specific GAMs to estimate average seasonal timing. Countries without sufficient data were assigned to a cluster based on population-weighted latitudinal distance to a cluster's mean latitude. Results: We identified five clusters, or epidemic zones, from 111 countries with sufficient data. The influenza burden in epidemic zones A and B was consistent with a northern hemisphere pattern, with most influenza detections occurring during October-April (A) and September-March (B), while epidemic zones D and E were characterized by southern hemisphere-like seasonal timing, with most influenza burden occurring during May-November. Epidemic zone C had most influenza burden occurring during September-March; most countries assigned to this cluster were in the tropics. 
Conclusion: Epidemic zones may serve as a useful tool to strengthen and optimize influenza surveillance for global health decision-making (e.g., during vaccine strain composition discussions) and to guide country preparedness efforts for seasonal influenza epidemics, including the timing of enhanced surveillance, as well as the procurement and delivery of vaccines and antivirals.
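The abstract does not spell out how weekly burden was normalized. A minimal sketch, assuming normalization means expressing each week as a share of the yearly total, shows why this step matters before clustering: two countries with the same seasonal shape but very different testing volumes end up at Euclidean distance zero. The function names and the toy series are ours.

```python
import math


def normalize(weekly_counts):
    """Express each week's detections as a share of the total (assumed definition)."""
    total = sum(weekly_counts)
    if total == 0:
        raise ValueError("cannot normalize a series with no detections")
    return [count / total for count in weekly_counts]


def euclidean(a, b):
    """Euclidean distance between two equal-length time series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical four-week series: same shape, tenfold difference in volume.
large_country = [100, 300, 500, 100]
small_country = [10, 30, 50, 10]
shape_distance = euclidean(normalize(large_country), normalize(small_country))
```

A clustering step such as k-means would then group countries by these normalized shapes rather than by how many specimens each country tested.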
Li, Y.; Cabral, H.; Tripodis, Y.; Ma, J.; Levy, D.; Joehanes, R.; Liu, C.; Lee, J.
Mediation analysis quantifies how an exposure affects an outcome through an intermediate variable. We extend mediation analysis to capture the cumulative effects of longitudinal predictors on longitudinal outcomes. Our proposed model examines how mediators transmit the effects of the current and previous exposure on the current outcome. We construct a least-squares estimator for the cumulative indirect effect (CIE) and use three approaches (exact form, delta method, and bootstrap procedure) to estimate its standard error (SE). The estimator of CIE is unbiased under no unmeasured confounding and independent model errors between the mediator model and the outcome model at all time points, as shown theoretically and in simulations. While the three SE estimates are numerically similar, the bootstrap procedure is recommended due to its simplicity of implementation. We apply this method to the Framingham Heart Study Offspring cohort to assess whether DNA methylation mediates the association of alcohol consumption with systolic blood pressure over two time points. We identify two CpGs (cg05130679 and cg05465916) as mediators and construct a composite DNA methylation score from 11 CpGs, which mediates 39% of the cumulative effect. In conclusion, we propose an unbiased estimator for CIE. Future studies will investigate missingness in mediators and outcomes.
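As a simplified illustration of the bootstrap idea, the sketch below estimates an indirect effect at a single time point with the classical product-of-coefficients approach and bootstraps its standard error. It is not the authors' cumulative estimator, which spans several time points; all function names are ours, and only the Python standard library is used.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (model with an intercept)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product a * b for exposure x, mediator m, outcome y."""
    a = slope(x, m)  # exposure -> mediator
    # Mediator -> outcome adjusted for exposure, via residual-on-residual regression.
    b = slope(residuals(x, m), residuals(x, y))
    return a * b


def bootstrap_se(x, m, y, n_boot=200, seed=0):
    """Standard deviation of the estimate over resampled data sets."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    return statistics.stdev(estimates)
```

The appeal of the bootstrap here is that the same resampling loop works unchanged if `indirect_effect` is replaced by a more elaborate estimator, whereas an exact or delta-method SE has to be re-derived for each model.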
RAZAFIMAHATRATRA, S. L.; RASOLOHARIMANANA, L. T.; ANDRIAMARO, T. M.; RANAIVOMANANA, P.; SCHOENHALS, M.
Interpreting serological data remains challenging, particularly in low-prevalence or cross-reactive contexts, where antibody responses often show substantial overlap between exposed and unexposed individuals and may depart from normal distributional assumptions. Conventional cutoff-based approaches often yield inconsistent or biased estimates of seroprevalence. Here, we present a decisional framework based on finite mixture models (FMMs) that enhances the robustness and interpretability of serological analyses. Beyond simply applying mixture models, our framework integrates multiple methodological innovations: (i) systematic comparison of Gaussian and skew-normal mixture models to accommodate asymmetric antibody distributions; (ii) rigorous model selection using the Cramér-von Mises test (p > 0.01) combined with a parsimonious score (APS) to prioritize models with well-separated clusters; and (iii) hierarchical clustering of posterior probabilities to collapse latent components into biologically meaningful seronegative and seropositive groups. Applied to chikungunya virus (CHIKV) data from Bangladesh, the framework produced prevalence estimates consistent with ROC-based methods while probabilistically identifying borderline cases. Validation on SARS-CoV-2 and dengue datasets further demonstrated its generalizability: for SARS-CoV-2, the approach identified up to five latent clusters with high sensitivity (up to 100%) and specificity (up to 100%), enabling discrimination by disease severity. For dengue, it revealed interpretable subgrouping consistent with background exposure and subclinical infection, despite limited confirmed cases. By integrating distributional flexibility, robust goodness-of-fit testing, and biologically guided cluster consolidation, this decisional FMM framework provides a reproducible and scalable method for serological interpretation across pathogens and epidemiological settings, addressing key limitations of threshold-based classification.
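To show what "probabilistically identifying borderline cases" can mean in the simplest setting, here is a sketch of the posterior probability of seropositivity under an already-fitted two-component Gaussian mixture. It covers only this basic building block; the skew-normal components, model selection, and cluster consolidation described in the abstract are not reproduced, and the parameter values in the example are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_seropositive(x, weight_pos, mean_neg, sd_neg, mean_pos, sd_pos):
    """Posterior probability that a measurement x belongs to the seropositive component."""
    pos = weight_pos * normal_pdf(x, mean_pos, sd_pos)
    neg = (1 - weight_pos) * normal_pdf(x, mean_neg, sd_neg)
    return pos / (pos + neg)


# Hypothetical fitted mixture on a log antibody scale:
# seronegative ~ N(0, 1), seropositive ~ N(2, 1), half of samples seropositive.
params = dict(weight_pos=0.5, mean_neg=0.0, sd_neg=1.0, mean_pos=2.0, sd_pos=1.0)
borderline = posterior_seropositive(1.0, **params)   # midway between the two means
clear_positive = posterior_seropositive(3.0, **params)
```

A fixed cutoff would force the midway sample into one class, whereas the posterior reports it as genuinely ambiguous; averaging the posteriors over all samples gives a prevalence estimate that does not depend on a threshold.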
Mahmud, S.; Akter, M. S.; Ahamed, B.; Rahman, A. E.; El Arifeen, S.; Hossain, A. T.
Background: Depressive symptoms among reproductive-aged women represent a major public health concern in low- and middle-income countries, yet systematic screening remains limited. In most population survey datasets, the low prevalence of depression results in severe class imbalance, which challenges conventional machine learning models. We therefore develop and evaluate a bagging-based ensemble machine learning framework to predict depressive symptoms among reproductive-aged women using highly imbalanced Bangladesh Demographic and Health Survey (BDHS) 2022 data. Methods: The sample comprised women aged 15-49 years drawn from BDHS 2022 data. Depressive symptoms were defined using the Patient Health Questionnaire (PHQ-9 ≥ 10). Candidate predictors were drawn from sociodemographic, reproductive, nutritional, psychosocial, healthcare access, and environmental domains. Feature selection was performed using Elastic Net (EN), Random Forest (RF), and XGBoost. Five classifiers (EN, RF, Support Vector Machine (SVM), K-nearest neighbors (KNN), and Gradient Boosting Machine (GBM)) were trained using both oversampling-based approaches and the proposed ensemble framework. Model performance was evaluated on an independent test set using accuracy, sensitivity, specificity, F1-score, and the normalized Matthews correlation coefficient (normMCC). Results: Approximately 4.8% of women were identified with depressive symptoms. The proposed bagging ensemble framework consistently achieved more balanced predictive performance than oversampling-based models. Average normMCC improved from 0.540 (oversampling) to 0.557 (ensemble). RF and GBM ensembles demonstrated notable improvements in identifying depressive cases, while the EN ensemble achieved the highest overall performance and sensitivity. Threshold optimization yielded stable normMCC across models, indicating robust trade-offs between sensitivity and specificity. Conclusions: Bagging-based ensemble learning provides a more robust and balanced approach than synthetic oversampling for predicting depressive symptoms in highly imbalanced population survey data. This approach has important implications for improving early identification and population-level mental health surveillance in resource-constrained settings.
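To make the headline metric concrete, the sketch below computes the Matthews correlation coefficient from a confusion matrix and rescales it to the unit interval as normMCC = (MCC + 1) / 2, which is the usual definition; the abstract does not spell out the formula, so treat that as an assumption. The confusion-matrix counts are hypothetical, chosen only to mimic a 4.8% prevalence in 1,000 women.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def norm_mcc(tp, fp, tn, fn):
    """Rescale MCC from [-1, 1] to [0, 1]; 0.5 corresponds to chance-level prediction."""
    return (mcc(tp, fp, tn, fn) + 1.0) / 2.0


# Hypothetical test set: 1,000 women, 48 with depressive symptoms.
# A classifier that labels everyone as negative is 95.2% accurate,
# yet its normMCC sits at the chance level.
always_negative = norm_mcc(tp=0, fp=0, tn=952, fn=48)
accuracy = (0 + 952) / 1000
print(accuracy, always_negative)
```

This is why accuracy alone is uninformative under severe class imbalance, and why a normMCC only slightly above 0.5 still indicates some signal beyond the majority-class baseline.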
Ihejirika, S. A.; Stephen, E.; Ye, K.
Gene-environment interactions (GEI) contribute to circulating polyunsaturated fatty acid (PUFA) and monounsaturated fatty acid (MUFA) profiles. GEI may partly explain differences in trait variance across genotype groups. To identify GEI for circulating unsaturated fatty acids, we adopted a two-stage strategy. First, we detected quantitative trait loci associated with trait variance (vQTLs). Second, we tested these vQTLs for interaction with fish oil supplements (FOS). We performed genome-wide vQTL screens for 14 plasma PUFA and MUFA phenotypes in a UK Biobank subset of 200,478 participants. At the genome-wide significance threshold (p < 5.0 × 10⁻⁸), we identified 172 vQTL-trait pairs across all 14 traits, and 16 of these vQTLs had no marginal genetic effect on the corresponding trait. We found 46 non-overlapping loci across all phenotypes, with an average of 12 vQTLs per trait. Omega-6% and PUFA% had the most independent vQTLs (N = 24), while DHA% and Omega-3% had the fewest (N = 1 and 2, respectively). For each of the 172 vQTL-trait pairs, we tested the interaction effect of the vQTL with FOS on the corresponding trait. We found six significant interaction signals in DHA, DHA%, Omega-3, Omega-3%, LA, and Omega-6/Omega-3 ratio around the FADS1/2, ZPR1, and SUGP1/TM6SF2 genes. Our results provide a comprehensive resource of vQTLs and gene-FOS interactions shaping the circulating levels of unsaturated fatty acids.
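The abstract does not say which variance test was used in the first stage. As one common choice for variance-heterogeneity screens, the sketch below computes the Brown-Forsythe (median-centred Levene) statistic across genotype groups; this is a swapped-in illustration, not necessarily the authors' method, and the trait values are hypothetical.

```python
from statistics import mean, median


def brown_forsythe_w(groups):
    """Brown-Forsythe statistic W for equality of variances across groups.

    Each observation is replaced by its absolute deviation from the group
    median, and a one-way ANOVA F statistic is computed on those deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical trait values grouped by genotype (0, 1 or 2 copies of an allele).
# Means differ but spreads are identical: a marginal effect, not a vQTL.
same_spread = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]
# Means are identical but spreads grow with allele count: a vQTL-like pattern.
growing_spread = [[9.0, 10.0, 11.0], [5.0, 10.0, 15.0], [0.0, 10.0, 20.0]]

w_same = brown_forsythe_w(same_spread)
w_growing = brown_forsythe_w(growing_spread)
print(w_same, w_growing)
```

The second pattern is the kind of signal a vQTL screen looks for: a locus can shift trait variance without shifting the mean, which is consistent with the abstract's observation that some vQTLs had no marginal genetic effect.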
Franzese, F.; Bergmann, M.; Burzynska, A.
Socioeconomic inequalities in health and well-being are a major public health concern, particularly in ageing populations. Education is a key determinant shaping multiple aspects of health outcomes. We used cross-sectional data from wave 9 of the German sample (n=4,148) of the Survey of Health, Ageing and Retirement in Europe (SHARE) to test whether formal education is associated with well-being in later adulthood, with health literacy, self-rated health, and preventive health behaviours as possible mediators. Our results showed that education was positively associated with well-being, but only via indirect pathways. Specifically, self-rated health, health literacy, and fruit and vegetable consumption mediated the relationship between education and well-being, accounting for 54.7%, 24.7%, and 12.6% of the total effect, respectively. In addition, education was significantly and positively correlated with health literacy, high-intensity physical activity, daily fruit and vegetable consumption, and preventive health check-ups, and negatively correlated with smoking. In contrast, alcohol consumption was more common among those with higher levels of education. All health behaviours and health literacy were correlated directly or indirectly (i.e., mediated by health) with well-being. These findings highlight the importance of examining indirect pathways linking education to well-being in later life. Interventions aimed at improving health literacy and promoting healthy behaviours may help reduce educational inequalities in quality of life among older adults.
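For readers unfamiliar with how a "share of the total effect" is obtained, the sketch below uses the product-of-coefficients approach: each indirect effect is the product of the exposure-to-mediator path and the mediator-to-outcome path, and its share is that product divided by the total effect. The abstract does not state the estimation method, so this is an assumption, and the path coefficients are hypothetical. Only the three reported percentages come from the abstract.

```python
def proportion_mediated(paths, direct_effect):
    """Share of the total effect carried by each mediator.

    paths maps a mediator name to (a, b): a is the exposure -> mediator
    coefficient and b is the mediator -> outcome coefficient.
    """
    indirect = {name: a * b for name, (a, b) in paths.items()}
    total = direct_effect + sum(indirect.values())
    return {name: effect / total for name, effect in indirect.items()}


# Hypothetical standardised path coefficients, for illustration only.
example_paths = {
    "self_rated_health": (0.30, 0.50),
    "health_literacy": (0.25, 0.20),
}
shares = proportion_mediated(example_paths, direct_effect=0.05)
print(shares)

# The three shares reported in the abstract sum to about 92% of the total
# effect; the abstract does not itemise the remainder.
reported = [54.7, 24.7, 12.6]
reported_sum = sum(reported)
print(round(reported_sum, 1))
```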
Schmidt, C.; Samartsidis, P.; Seaman, S.; Emmanouil, B.; Foster, G.; Reid, L.; Smith, S.; De Angelis, D.
To minimise health disparities, equitable access to medical treatment is paramount. In a pioneering intervention, National Health Service England's Hepatitis C virus (HCV) programme has implemented country-wide peer support to boost treatment access. Peer support workers (peers) are individuals with relevant lived experience, who promote testing and treatment in marginalised populations underserved by traditional health services. We evaluated the English peers intervention, exploiting its staggered rollout and rich surveillance data between June 2016 and May 2021. Peers increased the number of HCV cases identified by 13.9% (95% credible interval [95% CrI] 5.3 to 21.7), sustained viral responses by 8.0% (95% CrI -4.4 to 18.6), and drug services referrals by 8.8% (95% CrI -12.5 to 22.6). The intervention's effectiveness was magnified during the first COVID-19 lockdown, and individuals supported by peers typically belonged to populations with poor treatment access. Our findings indicate that peers can boost equity in treatment access on a national scale.
Essex, R.; Lim, S.; Jagnoor, J.
Background: Drowning remains a major global public health challenge. This study examined whether the timing and trajectories of urbanisation, beyond the current built environment, are associated with subnational drowning mortality. Methods: We linked satellite-derived measures of built-environment change (GHSL), population crowding (WorldPop), surface water exposure (JRC Global Surface Water), and infrastructure proxies (VIIRS/DMSP nighttime lights) to GBD 2021 drowning mortality estimates across 203 ADM1 regions in 12 countries (2006-2021; 3,248 region-year observations). Temporal predictors captured recent expansion, development "newness" (≤10-year built share), acceleration/volatility, and a crowding × growth interaction. We screened predictors using LASSO (10-fold cross-validation) and fitted mixed-effects models with region random intercepts. Distributed-lag models tested temporal precedence and development age, and income-stratified models assessed heterogeneity. Results: Adding temporal predictors improved fit beyond contemporaneous built-environment measures (ΔAIC = 177; ΔBIC = 147). In adjusted models, crowding × growth was strongly positively associated with drowning mortality, and a higher share of recent development was associated with higher mortality. Lag models showed a development age gradient: older built environment was most protective. Associations differed by income group, with several key coefficients reversing sign across strata. Discussion: Drowning mortality appears shaped by development histories as well as present-day conditions, with risk concentrated in rapidly changing, dense settings and the newest built environments. Cross-context heterogeneity suggests mechanisms and prevention priorities are unlikely to be uniform. Conclusions: Development timing and trajectories help explain subnational drowning mortality beyond current built form alone. Prevention and planning should prioritise transition-period safety strategies in newly developing and rapidly densifying areas.
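The model comparison in the Results rests on information criteria. The sketch below shows how AIC and BIC differences between a baseline and an extended model are computed, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The log-likelihoods and parameter counts are hypothetical, and using the 3,248 region-year observations as n in the BIC penalty is an assumption (conventions differ for mixed-effects models).

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion; n is the number of observations."""
    return k * math.log(n) - 2 * loglik


N_OBS = 3248  # region-year observations reported in the abstract

# Hypothetical fits: the extended model adds five temporal predictors.
base_loglik, base_k = -5000.0, 6
full_loglik, full_k = -4900.0, 11

delta_aic = aic(base_loglik, base_k) - aic(full_loglik, full_k)
delta_bic = bic(base_loglik, base_k, N_OBS) - bic(full_loglik, full_k, N_OBS)
print(delta_aic, round(delta_bic, 1))
```

Because BIC charges ln n per extra parameter where AIC charges 2, the BIC improvement is smaller than the AIC improvement whenever parameters are added and n exceeds about 7, which matches the direction of the reported ΔAIC and ΔBIC.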